Add ard_categorical_max() #244

Open: wants to merge 14 commits into main

edelarua (Contributor) commented Nov 25, 2024

What changes are proposed in this pull request?

Closes #240


Pre-review Checklist (if an item does not apply, mark it as complete)

  • All GitHub Action workflows pass with a ✅
  • PR branch has pulled the most recent updates from the main branch: usethis::pr_merge_main()
  • If a bug was fixed, a unit test was added.
  • If a new ard_*() function was added, it passes the ARD structural checks from cards::check_ard_structure().
  • If a new ard_*() function was added, set_cli_abort_call() has been set.
  • If a new ard_*() function was added and it depends on another package (such as broom), is_pkg_installed("broom") has been set in the function call and the following added to the roxygen comments: @examplesIf do.call(asNamespace("cardx")$is_pkg_installed, list(pkg = "broom"))
  • Code coverage is suitable for any new functions/features (generally, 100% coverage for new code): devtools::test_coverage()

Reviewer Checklist (if an item does not apply, mark it as complete)

  • If a bug was fixed, a unit test was added.
  • Code coverage is suitable for any new functions/features: devtools::test_coverage()

When the branch is ready to be merged:

  • Update NEWS.md with the changes from this pull request under the heading "# cardx (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end of the update (see NEWS.md for examples).
  • All GitHub Action workflows pass with a ✅
  • Approve Pull Request
  • Merge the PR. Please use "Squash and merge" or "Rebase and merge".

edelarua added the sme label Nov 25, 2024

github-actions bot commented Nov 25, 2024

Unit Tests Summary

1 file, 169 suites, 1m 15s ⏱️
167 tests: 167 ✅ passed, 0 💤 skipped, 0 ❌ failed
695 runs: 695 ✅ passed, 0 💤 skipped, 0 ❌ failed

Results for commit 2f14d54.

♻️ This comment has been updated with latest results.

github-actions bot commented Nov 25, 2024

Unit Test Performance Difference

Test Suite           Status  Time on main  ±Time  ±Tests  ±Skipped  ±Failures  ±Errors
-------------------  ------  ------------  -----  ------  --------  ---------  -------
ard_categorical_max  👶                    +0.00  +1      0         0          0

Additional test case details

Test Suite                     Status  Time on main  ±Time  Test Case
-----------------------------  ------  ------------  -----  -----------------------------------------------------------
ard_categorical.survey.design  💚      15.43         -1.83  ard_categorical.survey.design_works
ard_categorical_max            👶                    +0.00  ard_categorical_max_errors_with_incomplete_factor_columns
ard_categorical_max            👶                    +0.00  ard_categorical_max_follows_ard_structure
ard_categorical_max            👶                    +0.00  ard_categorical_max_quiet_works
ard_categorical_max            👶                    +0.00  ard_categorical_max_statistic_works
ard_categorical_max            👶                    +2.85  ard_categorical_max_works_with_default_settings
ard_categorical_max            👶                    +0.01  ard_categorical_max_works_with_pre_ordered_factor_variables
ard_categorical_max            👶                    +0.00  ard_categorical_max_works_without_any_variables
ard_continuous.survey.design   💔      16.25         +1.65  unstratified_ard_continuous.survey.design_works

Results for commit aa39b27

♻️ This comment has been updated with latest results.


Code Coverage Summary

Filename                                Stmts    Miss  Cover    Missing
------------------------------------  -------  ------  -------  -----------------------------------
R/add_total_n.survey.design.R              12       0  100.00%
R/ard_aod_wald_test.R                      77       8  89.61%   38-43, 93, 96
R/ard_attributes.survey.design.R            2       0  100.00%
R/ard_car_anova.R                          45       2  95.56%   62, 65
R/ard_car_vif.R                            62       1  98.39%   87
R/ard_categorical_ci.R                     96       1  98.96%   83
R/ard_categorical_ci.survey.design.R      129       1  99.22%   180
R/ard_categorical.survey.design.R         392       8  97.96%   77, 227-230, 274, 516, 530
R/ard_continuous_ci.R                      28       1  96.43%   38
R/ard_continuous_ci.survey.design.R       138       0  100.00%
R/ard_continuous.survey.design.R          274      14  94.89%   86, 177, 187, 338, 369-370, 418-426
R/ard_dichotomous.survey.design.R          73       3  95.89%   51, 156, 161
R/ard_effectsize_cohens_d.R               103       2  98.06%   69, 122
R/ard_effectsize_hedges_g.R                91       2  97.80%   68, 120
R/ard_emmeans_mean_difference.R            70       0  100.00%
R/ard_event_rates.R                        76      16  78.95%   72-75, 81, 113-116, 127-133
R/ard_missing.survey.design.R              79       1  98.73%   52
R/ard_regression_basic.R                   16       1  93.75%   46
R/ard_regression.R                         73       0  100.00%
R/ard_smd_smd.R                            69       5  92.75%   57, 83-86
R/ard_stats_anova.R                        95       0  100.00%
R/ard_stats_aov.R                          46       0  100.00%
R/ard_stats_chisq_test.R                   40       1  97.50%   39
R/ard_stats_fisher_test.R                  43       1  97.67%   42
R/ard_stats_kruskal_test.R                 36       1  97.22%   38
R/ard_stats_mcnemar_test.R                 80       2  97.50%   63, 106
R/ard_stats_mood_test.R                    49       1  97.96%   45
R/ard_stats_oneway_test.R                  39       0  100.00%
R/ard_stats_poisson_test.R                 76       1  98.68%   59
R/ard_stats_prop_test.R                    85       1  98.82%   43
R/ard_stats_t_test_onesample.R             41       0  100.00%
R/ard_stats_t_test.R                      112       2  98.21%   65, 111
R/ard_stats_wilcox_test_onesample.R        42       0  100.00%
R/ard_stats_wilcox_test.R                  99       2  97.98%   65, 117
R/ard_survey_svychisq.R                    38       1  97.37%   44
R/ard_survey_svyranktest.R                 54       1  98.15%   44
R/ard_survey_svyttest.R                    53       1  98.11%   42
R/ard_survival_survdiff.R                  89       0  100.00%
R/ard_survival_survfit_diff.R              76       0  100.00%
R/ard_survival_survfit.R                  197       5  97.46%   211-215
R/construction_helpers.R                  106      10  90.57%   160-175, 189, 248
R/proportion_ci.R                         195       1  99.49%   454
TOTAL                                    3596      97  97.30%

Diff against main

Filename               Stmts    Miss  Cover
-------------------  -------  ------  -------
R/ard_event_rates.R      +76     +16  +78.95%
TOTAL                    +76     +16  -0.40%

Results for commit: c023a1a

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

ddsjoberg (Collaborator)

Thanks @edelarua! Can you help me understand the differences between this and the hierarchical function? Also, when we refer to ordered, does that mean ordered factors?

edelarua (Contributor, Author)

This works similarly to the hierarchical function but doesn't use hierarchies, so it calculates event rates for flat tables. The ordered argument is used to count "ordered" variables by their highest level (e.g. grade/severity). The hierarchical function can't currently do this unless the hierarchy has more than one level, because it relies on a workaround that moves the analysis variable into by.

For example, the grade rows in this table could be calculated with ard_event_rates(variables = AESEV, ordered = TRUE) but not ard_hierarchical(): https://insightsengineering.github.io/tlg-catalog/stable/tables/adverse-events/aet01_aesi.html#output

Honestly, we could probably simplify the hierarchical functions by implementing this upstream in cards (or internally); either way, there are several common tables where this would be useful.
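To make the "count by highest level" idea concrete, here is a minimal base R sketch of the underlying calculation. It is not the ard_event_rates()/ard_categorical_max() implementation: the toy adae/adsl data are hypothetical, and it assumes the factor level order of AESEV defines which grade is "highest". Each subject contributes one row (their most severe event), and percentages use all subjects in the arm from adsl as the denominator.

```r
# Hypothetical toy data (not from the PR): one row per adverse event.
adae <- data.frame(
  USUBJID = c("01", "01", "02", "03", "03"),
  TRTA    = c("A", "A", "A", "B", "B"),
  AESEV   = factor(
    c("MILD", "SEVERE", "MODERATE", "MILD", "MILD"),
    levels = c("MILD", "MODERATE", "SEVERE")  # level order defines "highest"
  )
)

# Subject-level denominator: every subject in each arm, with or without events.
adsl <- data.frame(
  USUBJID = c("01", "02", "03", "04"),
  TRTA    = c("A", "A", "B", "B")
)

# 1. Sort so that each subject's most severe record comes last.
adae_sorted <- adae[order(adae$USUBJID, adae$AESEV), ]

# 2. Keep the last row per subject, i.e. their highest level.
worst <- adae_sorted[!duplicated(adae_sorted$USUBJID, fromLast = TRUE), ]

# 3. Count subjects by worst grade within arm, then divide by each arm's N.
n   <- table(worst$TRTA, worst$AESEV)
N   <- table(adsl$TRTA)
pct <- sweep(n, 1, as.vector(N[rownames(n)]), "/")
pct
```

Subject 04 has no events but still counts in arm B's denominator, which is why the denominator has to come from a subject-level dataset rather than from the event data.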

ddsjoberg (Collaborator)

Hmmm, before we merge this in, can we brainstorm together a bit? Either today or next week?

ddsjoberg (Collaborator)

Would another way to calculate these quantities be:

ADAE |> 
  dplyr::slice_max(AESEV, n = 1, with_ties = FALSE, by = USUBJID) |> 
  cards::ard_categorical(
    by = TRTA, 
    variables = AESEV,
    denominator = ADSL |> dplyr::select(USUBJID, TRTA = ARM)
  )

edelarua changed the title from Add ard_event_rates() to Add ard_categorical_max() on Jan 9, 2025
ddsjoberg self-requested a review on January 9, 2025 19:36

ddsjoberg (Collaborator) left a comment

This is great, thank you!! Let's chat about the comments before you move forward making any changes.

R/ard_categorical_max.R: 9 outdated review threads (resolved)
edelarua requested a review from ddsjoberg on January 10, 2025 23:05

ddsjoberg (Collaborator) left a comment

This is getting close! I keep confusing myself thinking about the dataset passed to denominator when it doesn't contain all the by variables. I think it may result in incorrect values... 🤷🏼

call = get_cli_abort_call()
)
}
if (is_empty(denominator)) denominator <- data

ddsjoberg (Collaborator), Jan 11, 2025

Since we're doing this, should we just make the default denominator = data in the function definition? Then it will be clearer to users what the default is.
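A quick illustration of why the suggested default works in R: default arguments are evaluated lazily inside the function, so a default can refer to another argument. The function and data below are hypothetical and only mirror the data/denominator pair discussed here; this is not the cardx signature.

```r
# Hypothetical function (not the cardx API) whose default refers to another argument.
count_subjects_sketch <- function(data, denominator = data) {
  # The default is evaluated lazily inside the function, so when
  # `denominator` is not supplied it falls back to `data`.
  nrow(denominator)
}

adsl <- data.frame(USUBJID = c("01", "02", "03"))
adae <- data.frame(USUBJID = c("01", "01"))

count_subjects_sketch(adae)                      # 2: denominator defaults to data
count_subjects_sketch(adae, denominator = adsl)  # 3
```

With a signature like this, the default also shows up in the usage section of the roxygen-generated help page, which is the clarity benefit raised above.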

function(x) {
ard_categorical(
data = data |>
cards:::arrange_using_order(c(id, by, x)) |>

ddsjoberg (Collaborator)

Unfortunately, CRAN won't allow ::: (calling an unexported function from cards), so we'll have to copy the function into cardx. But please add a note that it's copied from cards and why we're using it (so we don't forget!)

ard_categorical(
data = data |>
cards:::arrange_using_order(c(id, by, x)) |>
dplyr::slice_tail(n = 1L, by = all_of(c(id, intersect(by, names(denominator))))),

ddsjoberg (Collaborator)

This part, intersect(by, names(denominator)), is confusing me a bit. What if someone passes an integer as the denominator? Then it will return NULL, and as a result we would have dplyr::slice_tail(n = 1L, by = all_of(id)). But above we sorted by c(id, by, x), so the max value of x could have appeared in the first by level and not been sorted to the bottom. Right?

I am trying to think through the implications of this line... 🤔 What if someone specifies by = c("ARM", "SEX"), but the denominator only has "SEX"? In the previous step, we've sorted by ID, then ARM, then SEX, then the variable. Then within ID and SEX, we're taking the last observation... Is that what we want? Should it be a requirement that the denominator dataset has all the by variables when it is a data frame? (I am really not sure, so I am asking!) 😆
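The first concern can be reproduced with a small base R sketch. The data are hypothetical, PERIOD is an assumed by variable that varies within a subject, and last_per_group() is a hypothetical stand-in for dplyr::slice_tail(n = 1L, by = ...). When rows are sorted by c(id, by, x) but the slice groups by id only, the row kept per subject is the maximum within the last by level, not the subject's overall maximum.

```r
# Hypothetical toy data: one subject with records in two levels of an
# assumed by variable (PERIOD) that varies within the subject.
adae <- data.frame(
  USUBJID = c("01", "01", "01"),
  PERIOD  = c("A", "A", "B"),
  AESEV   = factor(c("MILD", "SEVERE", "MODERATE"),
                   levels = c("MILD", "MODERATE", "SEVERE"))
)

# Hypothetical stand-in for dplyr::slice_tail(n = 1L, by = group_vars):
# keep the last row of each group, in the current row order.
last_per_group <- function(df, group_vars) {
  key <- interaction(df[group_vars], drop = TRUE)
  df[!duplicated(key, fromLast = TRUE), , drop = FALSE]
}

# Sort by c(id, by, x), as in the excerpt above.
sorted <- adae[order(adae$USUBJID, adae$PERIOD, adae$AESEV), ]

# Slicing by c(id, by): one row per subject and PERIOD (SEVERE for A, MODERATE for B).
by_id_and_period <- last_per_group(sorted, c("USUBJID", "PERIOD"))

# Slicing by id only (what c(id, intersect(by, names(denominator))) reduces to
# for an integer denominator): MODERATE is kept, although the worst grade is SEVERE.
by_id_only <- last_per_group(sorted, "USUBJID")
by_id_only$AESEV

# Sorting by id and x only makes the last row per subject the overall maximum.
sorted_id_x <- adae[order(adae$USUBJID, adae$AESEV), ]
last_per_group(sorted_id_x, "USUBJID")$AESEV
```

This only shows the mechanics; whether the grouping should instead require every by variable in the denominator is the design question being asked here.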

fmt_fn = fmt_fn,
stat_label = stat_label
) |>
list()

ddsjoberg (Collaborator)

Since we're in an lapply() we don't need to pipe this into list(), right?
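On the list() question: lapply() already collects each call's result into a list, so piping each result into list() adds one more level of nesting per element. A small base R sketch, with a hypothetical make_ard() standing in for the per-variable ard_categorical() call (it returns a plain data frame, not a real ARD):

```r
# Hypothetical stand-in for one per-variable ARD: a plain data frame.
make_ard <- function(x) data.frame(variable = x, stat = 1)

vars <- c("AESEV", "AETOXGR")

# lapply() already returns a list with one data frame per variable.
plain <- lapply(vars, make_ard)

# Wrapping each result in list() nests every data frame one level deeper.
nested <- lapply(vars, function(x) list(make_ard(x)))

is.data.frame(plain[[1]])   # TRUE
is.data.frame(nested[[1]])  # FALSE: an extra list layer
do.call(rbind, plain)       # stacks the two data frames into one
```

Whether the extra layer is harmless depends on how the list is combined afterwards, which isn't visible in this excerpt.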


# print default order of character variable levels ---------------------------
for (v in variables) {
if (is.character(data[[v]])) {

ddsjoberg (Collaborator)

To be on the safe side, can we print this for all variables?
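One way printing the level order for every variable could look, as a base R sketch. level_order() is a hypothetical helper, and it assumes that variables which are not factors are ordered the way factor() orders them by default (sort(unique(x))), which may not match what cards does internally.

```r
# Hypothetical helper (not from cardx): report the level order used for each variable.
level_order <- function(data, variables) {
  out <- lapply(variables, function(v) {
    x <- data[[v]]
    # Factors keep their declared levels; anything else gets the default
    # order factor() would assign, i.e. sort(unique(x)) without NA.
    if (is.factor(x)) levels(x) else as.character(sort(unique(x)))
  })
  names(out) <- variables
  for (v in variables) {
    message(v, ": ", paste(out[[v]], collapse = " < "))
  }
  invisible(out)
}

adae <- data.frame(
  AESEV = factor(c("MILD", "SEVERE"), levels = c("MILD", "MODERATE", "SEVERE")),
  AEREL = c("Y", "N")
)

res <- level_order(adae, c("AESEV", "AEREL"))
```

Printing factors too costs little and makes the ordering explicit even when it was set deliberately.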

Successfully merging this pull request may close these issues.

Add ARD function for counting patients
2 participants